Research Synthesis Methods
Top medRxiv preprints most likely to be published in this journal, ranked by match strength.
Introduction: We investigated whether large language models (LLMs) can be used for abstract screening in systematic and scoping reviews. Methods: Two broad reviews were designed: a systematic review structured according to the PRISMA guideline with abstract inclusion based on PICO criteria; and a scoping review, where we defined abstract characteristics and features of interest to look for. For both reviews, 500 abstracts were sampled. Two readers independently screened abstracts with disagreements hand...
Background: The Cochrane Collaboration has been publishing systematic reviews in the Cochrane Database of Systematic Reviews (CDSR) since 1995, with the intention that these be updated periodically. Objectives: To chart the long-term updating history of a cohort of Cochrane reviews and the impact on the number of included studies. Methods: The status of a cohort of Cochrane reviews updated in 2003 was assessed at three time points: 2003, 2011, and 2018. We assessed their subject scope, compiled thei...
Background: Systematic literature reviews (SLRs) are essential for evidence synthesis but are hampered by the resource-intensive full-text screening phase. Loon Lens Pro, a publicly available agentic AI tool, automates full-text screening without prior training by using user-defined inclusion/exclusion criteria and multiple specialized AI agents. This study validated Loon Lens Pro against human reviewers to assess its accuracy, efficiency, and confidence scoring in screening. Methods: In this compa...
Introduction: Overlap of primary studies among systematic reviews (SRs) included in an overview is a major challenge, as it may bias results or artificially increase the precision of the synthesis. Matrices of evidence and corrected covered area (CCA) calculation are recommended methods to manage overlap, but there is little guidance on how to construct these matrices. This research aims to explore variations in the estimation of overlap using CCA matrices under different assumptions. Methods: We w...
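For readers unfamiliar with the CCA mentioned in the abstract above, the following is a minimal sketch of how it is commonly computed from a matrix of evidence: CCA = (N - r) / (r * c - r), where N is the total number of study inclusions counted across all reviews, r is the number of unique primary studies (rows), and c is the number of reviews (columns). The function name, the dictionary layout, and the example review and study labels are hypothetical and are not taken from the preprint, whose own construction rules are not visible in the truncated abstract.

```python
from typing import Dict, Set


def corrected_covered_area(matrix: Dict[str, Set[str]]) -> float:
    """Compute the corrected covered area (CCA) for an overview.

    `matrix` maps each review identifier to the set of primary-study
    identifiers it includes (a hypothetical layout for illustration).
    """
    c = len(matrix)  # number of reviews (columns)
    if c < 2:
        raise ValueError("CCA needs at least two reviews")
    unique_studies = set().union(*matrix.values())
    r = len(unique_studies)  # number of unique primary studies (rows)
    if r == 0:
        raise ValueError("CCA needs at least one primary study")
    # N counts every inclusion, so a study in two reviews counts twice.
    n = sum(len(studies) for studies in matrix.values())
    return (n - r) / (r * c - r)


# Hypothetical example: three reviews sharing some primary studies.
reviews = {
    "SR1": {"A", "B", "C"},
    "SR2": {"B", "C", "D"},
    "SR3": {"C", "E"},
}
cca = corrected_covered_area(reviews)  # (8 - 5) / (5 * 3 - 5)
```

How duplicate publications, structural zeros, or updated reviews are handled before this step changes N, r, and c, which is the kind of assumption the abstract says can shift the estimate.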
Objectives: To quantify the amount and certainty of evidence in Cochrane systematic reviews of interventions, and to describe how this evidence has evolved over time. Design: Large-scale meta-research study. Data source: Cochrane Database of Systematic Reviews (search date April 8, 2025). Eligibility criteria: Cochrane systematic reviews assessing interventions reporting "Summary of findings" tables. Data extraction: Data were automatically extracted using web scraping and a large language model, with q...
Background: This paper details initial testing of the agreeability and usability of a novel quality appraisal tool for systematic reviews of prognostic factor studies: AMSTAR-PF. Methods: Fourteen appraisers each assessed eight systematic reviews using AMSTAR-PF. Their ratings for each question and each article were compared, with interrater, inter-pair and intra-pair agreeability calculated using Gwet's agreement coefficient. Time of use and time to reach consensus were also recorded. Results: Interr...
Background: Systematic reviews are important for informing public health policies and program selection; however, they are time- and resource-intensive. Artificial intelligence (AI) offers a solution to reduce these labour-intensive requirements for various aspects of systematic review production, including data extraction. To date, there is limited robust evidence evaluating the accuracy and efficiency of AI for data extraction. This study within a review (SWAR) aimed to determine whether human d...
Background: Internationally accepted standards for systematic reviews necessitate assessment of the risk of bias of primary studies. Assessing risk of bias, however, can be time- and resource-intensive. AI-based solutions may increase efficiency and reduce burden. Objective: To evaluate the reliability of ChatGPT for performing risk of bias assessments of randomized trials using the revised risk of bias tool for randomized trials (RoB 2.0). Methods: We sampled recently published Cochrane systematic ...
Importance: Systematic reviews (SRs) inform evidence-based decision making. Yet, many take over a year to complete, are labor intensive, prone to human error, and face reproducibility challenges, thus limiting access to timely and reliable information. Objective: To validate a large language model (LLM)-based workflow (otto-SR) to automate three of the most labour intensive tasks in performing SRs: article screening, data extraction, and risk of bias assessment; a...
Introduction: Algorithmic bias in systematic reviews that use automatic screening is a major challenge in the application of AI in health sciences. This article presents preliminary findings from the project titled "Identification, Reporting, and Mitigation of Algorithmic Bias in Systematic Reviews with AI-Assisted Screening: Systematic Review and Development of a Checklist for its Evaluation" registered in PROSPERO with the registration number CRD420251036600 (https://www.crd.york.ac.uk/PROSPERO/...
Background: From 2006 to 2019, Cochrane reviews could be designated "stable" if they were not being updated but highly likely to be current. This provides an opportunity to observe practice in ending systematic reviewing and what is regarded as enough evidence. Methods: We identified Cochrane reviews designated stable in 2013 and 2019 and reasons for this designation. For those with conclusions stated to be so firm that new evidence is unlikely to change them, we assessed conclusions, strength of e...
Objective: We investigated the use of systematic review automation tools by systematic reviewers, health technology assessors and clinical guideline developers. Study design and settings: An online, 16-question survey was distributed across several evidence synthesis, health technology assessment and guideline development organisations internationally. We asked the respondents what tools they use and abandon, how often and when they use the tools, their perceived time savings and accuracy, and desi...
Background: The rapidly accumulating scientific literature in HIV presents a significant challenge in accurately and efficiently assessing the relevant literature. This study explores the potential capabilities of using large language models (LLMs), such as ChatGPT, for selecting relevant studies for a systematic review. Method: Scientific papers were initially obtained from bibliographic database searches using a Boolean search strategy with pre-defined keywords. From 15,839 unique records, three ...
Objectives: To examine changes in completeness of reporting and frequency of sharing data, analytic code and other review materials in systematic reviews (SRs) over time; and factors associated with these changes. Design: Cross-sectional meta-research study. Sample: A random sample of 300 SRs with meta-analysis of aggregate data on the effects of a health, social, behavioural or educational intervention, which were indexed in PubMed, Science Citation Index, Social Sciences Citation Index, Scopus and...
Importance: Systematic reviews are time-consuming and are still performed predominantly manually by researchers despite the exponential growth of scientific literature. Objective: To investigate the sensitivity and specificity, and to estimate the avoidable workload, when using an AI-based large language model (LLM) (Generative Pre-trained Transformer [GPT] version 3.5-Turbo from OpenAI) to perform title and abstract screening in systematic reviews. Data Sources: Unannotated bibliographic databases from fiv...
Meta-analysis is an established methodology for evidence synthesis. In practice, substantial heterogeneity often arises among studies, and random-effects models are widely employed as standard tools. However, in many cases of data synthesis, some studies exhibit markedly different characteristics from others, beyond the degree expected from statistical error, and may become influential outliers that affect the overall conclusions. Although outlier detection and influence diagnostic methods have ...
Background: Bias assessment is a crucial step in evaluating evidence from randomized controlled trials. The widely adopted Cochrane RoB 2, designed to identify these issues, is complex, resource-intensive, and unreliable. Advances in artificial intelligence (AI), particularly in the field of large language models (LLMs), now allow the automation of complex tasks. While prior investigations have focused on whether LLMs could perform assessments with RoB 2, integrating technologies does not resolve ...
Objective: To empirically explore the level of agreement of the treatment hierarchies from different ranking metrics in network meta-analysis (NMA) and to investigate how network characteristics influence the agreement. Design: Empirical evaluation from re-analysis of network meta-analyses. Data: 232 networks of four or more interventions from randomised controlled trials, published between 1999 and 2015. Methods: We calculated treatment hierarchies from several ranking metrics: relative treatment ef...
Background: Systematic reviews require extensive time and effort to manually extract and synthesize data from numerous screened studies. This study aims to investigate the ability of large language models (LLMs) to automate data extraction with high accuracy and minimal bias, using clinical questions (CQs) of the Japanese Clinical Practice Guidelines for Management of Sepsis and Septic Shock (J-SSCG) 2024. The study will evaluate the accuracy of three LLMs and optimize their command prompts to enh...
Introduction: Several filters are routinely used to remove animal or nonhuman records in Ovid Embase, despite there being no performance data for them. The filters take different approaches in design. Objective: To understand and compare the impact of 11 filters to remove animal or nonhuman records in Ovid Embase. To understand the indexing of relevant subject headings in Embase. Methods: To assess filter performance, we screened and categorised 3,000 records as should be removed or should be reta...